As before, import the libraries we'll need...
In [1]:
from __future__ import unicode_literals, print_function
import boto3
import json
import numpy as np
import pandas as pd
import spacy
...and instantiate Verta's ModelDB Client.
In [2]:
from verta import Client
client = Client('https://app.verta.ai')
proj = client.set_project('Tweet Classification')
expt = client.set_experiment('SpaCy')
Let's say someone has provided us with a new, experimental dataset that supposedly will improve our model. Unbeknownst to everyone, this dataset actually contains only one of the two classes we're interested in. This is going to hurt our performance, but we don't know it yet.
Before, we trained a model on english-tweets.csv. Now, we're going to train with positive-english-tweets.csv.
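A quick sanity check could have caught this problem up front. The sketch below is illustrative only: it uses a toy DataFrame in place of the real CSV, and the `sentiment` column name is a hypothetical stand-in for whatever label column the dataset actually uses.

```python
import pandas as pd

# Toy stand-in for pd.read_csv("positive-english-tweets.csv");
# the 'sentiment' column name here is hypothetical.
data = pd.DataFrame({
    'text': ['great day', 'love this', 'so happy'],
    'sentiment': [1, 1, 1],
})

# Count how many distinct labels the dataset actually contains
label_counts = data['sentiment'].value_counts()
print(label_counts)

if len(label_counts) < 2:
    print("warning: dataset contains only one class")
```

On a healthy two-class dataset, `value_counts()` would report both labels; here it reveals the single-class problem immediately.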
In [3]:
S3_BUCKET = "verta-starter"
S3_KEY = "positive-english-tweets.csv"
FILENAME = S3_KEY
boto3.client('s3').download_file(S3_BUCKET, S3_KEY, FILENAME)
In [4]:
import utils
data = pd.read_csv(FILENAME).sample(frac=1).reset_index(drop=True)
utils.clean_data(data)
data.head()
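The `sample(frac=1).reset_index(drop=True)` idiom in the cell above shuffles the full DataFrame, then renumbers the rows so the index no longer reflects the original order. A minimal sketch on a toy DataFrame (the `random_state` is added here just to make the shuffle reproducible):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]})

# frac=1 samples every row (i.e., a full shuffle);
# reset_index(drop=True) discards the old row labels
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

print(shuffled.index.tolist())  # [0, 1, 2, 3] regardless of shuffle order
```

Without `drop=True`, the old index would be kept as a new column instead of being discarded.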
Out[4]:
As before, we'll capture and log our model ingredients directly onto our repository's master branch.
In [5]:
from verta.code import Notebook
from verta.configuration import Hyperparameters
from verta.dataset import S3
from verta.environment import Python
code_ver = Notebook() # Notebook & git environment
config_ver = Hyperparameters({'n_iter': 20})
dataset_ver = S3("s3://{}/{}".format(S3_BUCKET, S3_KEY))
env_ver = Python() # pip environment and Python version
In [6]:
repo = client.set_repository('Tweet Classification')
commit = repo.get_commit(branch='master')
In [7]:
commit.update("notebooks/tweet-analysis", code_ver)
commit.update("config/hyperparams", config_ver)
commit.update("data/tweets", dataset_ver)
commit.update("env/python", env_ver)
commit.save("Update tweet dataset")
commit
Out[7]:
You may verify through the Web App that this commit updates the dataset, as well as the Notebook.
Once again, we'll train the model and log it, along with the commit, to an Experiment Run.
In [8]:
nlp = spacy.load('en_core_web_sm')
In [9]:
import training
training.train(nlp, data, n_iter=20)
In [10]:
run = client.set_experiment_run()
run.log_model(nlp)
In [11]:
run.log_commit(
    commit,
    {
        'notebook': "notebooks/tweet-analysis",
        'hyperparameters': "config/hyperparams",
        'training_data': "data/tweets",
        'python_env': "env/python",
    },
)
Looking back over our workflow, we might notice something suspicious about the model's precision, recall, and F-score. This model isn't performing as it should, and we don't want it to be the latest commit on master. Using the Client, we'll revert the commit.
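It's worth pausing on why the metrics look wrong. A model trained only on positive tweets tends to label everything positive, so recall looks perfect while precision collapses toward the base rate. The sketch below uses made-up labels and a degenerate "always positive" predictor to illustrate the pattern; it is not output from the spaCy model above.

```python
# Hypothetical test labels and a degenerate model that has only
# ever seen the positive class, so it predicts positive for everything
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)                              # 0.375
recall = tp / (tp + fn)                                 # 1.0
f_score = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f_score, 3))
```

Perfect recall paired with precision near the positive base rate is exactly the signature that should make us suspicious of the training data.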
In [12]:
commit
Out[12]:
In [13]:
commit.revert()
commit
Out[13]:
As easy as that: we have a new commit on master that reverts our grave mistake. Again, the Web App will show that the change from english-tweets.csv to positive-english-tweets.csv has been undone.
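Conceptually, a revert doesn't rewrite history: it adds a new commit whose contents restore the state before the mistake. The sketch below models this with plain dictionaries; it is a conceptual illustration, not Verta's actual internals.

```python
# Each "commit" is modeled as a mapping from repository paths to contents
history = [
    {"data/tweets": "english-tweets.csv"},           # original commit
    {"data/tweets": "positive-english-tweets.csv"},  # mistaken update
]

# Revert: append a new commit that reapplies the pre-mistake state,
# leaving the mistaken commit in the history
history.append(dict(history[-2]))

print(history[-1]["data/tweets"])  # english-tweets.csv
print(len(history))                # 3 commits: history is preserved
```

This is why the reverted commit still appears in the repository's log: the mistake remains visible and auditable, while master's latest state is correct again.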